
Conversation

@julienrbrt
Member

Overview

Attempt to replace the go-header default store with the ev-node store.
This avoids duplicating blocks.

@github-actions
Contributor

github-actions bot commented Jan 30, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Feb 2, 2026, 1:08 PM

@claude
Contributor

claude bot commented Jan 30, 2026

Claude finished @julienrbrt's task in 2m 18s.


Comprehensive Code Review

I've completed a thorough analysis of this PR that refactors the store layer to replace go-header's default store with ev-node's unified store. Overall, this is a well-architected refactoring that successfully eliminates data duplication. However, I've identified several important issues and areas for improvement.


Critical Issues

1. Race Condition in heightSub.Wait() - CRITICAL

Location: pkg/store/store_adapter.go:65-89

The Wait() method has a double-check locking pattern that's vulnerable to race conditions:

func (hs *heightSub) Wait(ctx context.Context, height uint64) error {
    // Fast path: height already reached
    if hs.height.Load() >= height {
        return errElapsedHeight
    }

    hs.heightMu.Lock()
    // Double-check after acquiring lock
    if hs.height.Load() >= height {
        hs.heightMu.Unlock()
        return errElapsedHeight
    }

    // Create a channel to wait on
    ch := make(chan struct{})
    hs.heightChs[height] = append(hs.heightChs[height], ch)
    hs.heightMu.Unlock()  // ⚠️ Lock released before select

Problem: Between releasing the lock and entering the select, another goroutine could call SetHeight() and notify the channel. If that notification is a send rather than a close, it can be lost, and the waiting goroutine then blocks indefinitely.

Impact: Deadlocks in P2P sync operations when multiple goroutines wait for the same height.

Fix race condition →


2. Unbounded Memory Growth in Concurrent Height Waiters

Location: pkg/store/store_adapter.go:38-104

The heightChs map can grow unbounded if many goroutines wait for heights that never arrive:

type heightSub struct {
    height    atomic.Uint64
    heightMu  sync.Mutex
    heightChs map[uint64][]chan struct{}  // ⚠️ No cleanup mechanism
}

Problem: If a height is requested but never stored (due to P2P failures, context cancellation, or bugs), the channels remain in the map forever, causing memory leaks.

Impact: Memory exhaustion over time, especially in long-running nodes or during P2P sync issues.

Recommendation: Implement cleanup on context cancellation:

defer func() {
    hs.heightMu.Lock()
    // Remove this channel from the waiters list
    if chs, ok := hs.heightChs[height]; ok {
        for i, c := range chs {
            if c == ch {
                hs.heightChs[height] = append(chs[:i], chs[i+1:]...)
                break
            }
        }
    }
    hs.heightMu.Unlock()
}()

Fix memory leak →
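
Putting the two issues together, one way to structure Wait() is to register the waiter channel while holding the lock, rely on channel-close semantics for the wake-up, and deregister on context cancellation. A minimal sketch, assuming SetHeight closes (rather than sends on) every channel registered for a reached height; removeWaiter is a hypothetical helper wrapping the cleanup shown above:

func (hs *heightSub) Wait(ctx context.Context, height uint64) error {
    // Fast path: height already reached.
    if hs.height.Load() >= height {
        return errElapsedHeight
    }

    hs.heightMu.Lock()
    if hs.height.Load() >= height {
        hs.heightMu.Unlock()
        return errElapsedHeight
    }
    ch := make(chan struct{})
    hs.heightChs[height] = append(hs.heightChs[height], ch)
    hs.heightMu.Unlock()

    select {
    case <-ch:
        // SetHeight closed the channel; a close is observed even if it
        // happened before this select was reached, so no wake-up is lost.
        return nil
    case <-ctx.Done():
        // Deregister so the channel cannot leak in heightChs.
        hs.removeWaiter(height, ch)
        return ctx.Err()
    }
}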


High Priority Issues

3. Inconsistent Error Handling in GetByHeight()

Location: pkg/store/store_adapter.go:294-310

func (a *StoreAdapter[H]) GetByHeight(ctx context.Context, height uint64) (H, error) {
    var zero H

    // Try to get the item first
    if item, err := a.getByHeightNoWait(ctx, height); err == nil {
        return item, nil
    }

    // If not found, wait for the height to be stored
    err := a.heightSub.Wait(ctx, height)
    if err != nil && !errors.Is(err, errElapsedHeight) {  // ⚠️
        return zero, err
    }

    // Try again after waiting
    return a.getByHeightNoWait(ctx, height)
}

Problem: The function treats errElapsedHeight as success, so the second getByHeightNoWait() call runs whenever Wait() returns nil or errElapsedHeight. If the item still isn't readable from the store at that point, the caller only sees header.ErrNotFound, with no trace of what Wait() observed.

Impact: The underlying cause of the failure is hidden behind a generic not-found error, making debugging difficult.

Recommendation:

if err != nil {
    if !errors.Is(err, errElapsedHeight) {
        return zero, err
    }
    // errElapsedHeight is fine, continue to fetch
}

4. O(n) Performance in Tail() for Pruned Nodes

Location: pkg/store/store_adapter.go:234-270

As noted in the TODO and review comments, the Tail() method performs a linear scan:

// Walk up from genesisInitialHeight to find the first available item (pruning case)
for h := a.genesisInitialHeight + 1; h <= height; h++ {
    item, err = a.getter.GetByHeight(ctx, h)
    if err == nil {
        return item, nil
    }
    // ...
}

Problem: For a pruned node with initial height 1 and actual tail at height 1M, this makes 1M queries.

Impact: Severe performance degradation for pruned nodes during P2P operations that call Tail().

Note: The PR author acknowledges this in the TODO comment at line 233 and plans to address it in PR #2984. This is acceptable as a follow-up, but should be prioritized for pruned node deployments.
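
In the meantime, a binary search over heights would already cut the scan from O(n) to O(log n). A minimal sketch under the assumption that pruning removes a contiguous prefix of heights (everything below the true tail is missing, everything from the tail up to the head is present); findTail is a hypothetical helper, and for brevity it treats any non-context lookup error as "height pruned":

func findTail[H header.Header[H]](ctx context.Context, getter StoreGetter[H], initial, head uint64) (uint64, error) {
    lo, hi := initial, head
    for lo < hi {
        if ctx.Err() != nil {
            return 0, ctx.Err()
        }
        mid := lo + (hi-lo)/2
        if _, err := getter.GetByHeight(ctx, mid); err == nil {
            hi = mid // mid is available, so the tail is at or below mid
        } else {
            lo = mid + 1 // mid is pruned, so the tail is above mid
        }
    }
    // lo is now the lowest height that is still available.
    if _, err := getter.GetByHeight(ctx, lo); err != nil {
        return 0, err
    }
    return lo, nil
}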


5. Potential Data Race in Head() Method

Location: pkg/store/store_adapter.go:200-228

func (a *StoreAdapter[H]) Head(ctx context.Context, _ ...header.HeadOption[H]) (H, error) {
    var zero H

    storeHeight, _ := a.getter.Height(ctx)  // ⚠️ Error ignored
    pendingHead, pendingHeight := a.pendingHead()

    // Prefer pending if it's higher than store
    if pendingHeight > storeHeight {
        a.heightSub.SetHeight(pendingHeight)  // ⚠️ Race with concurrent SetHeight calls
        return pendingHead, nil
    }

Problems:

  1. Ignored Error: getter.Height() error is silently ignored, defaulting to storeHeight = 0
  2. SetHeight Race: Multiple concurrent Head() calls can race on SetHeight(), potentially setting stale heights

Recommendation:

  • Check and handle the error from getter.Height()
  • Use compare-and-swap when updating the height so it never moves backwards (see the sketch below)
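
For the second point, a small compare-and-swap loop on the existing atomic keeps the tracked height monotonic under concurrent Head() calls. A minimal sketch; setIfHigher is a hypothetical helper on the heightSub type shown earlier:

func (hs *heightSub) setIfHigher(h uint64) {
    for {
        cur := hs.height.Load()
        if h <= cur {
            return // already at or beyond h; never move backwards
        }
        if hs.height.CompareAndSwap(cur, h) {
            return
        }
        // Another goroutine advanced the height concurrently; retry.
    }
}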

Medium Priority Issues

6. Missing Context Propagation in Constructors

Location: pkg/store/store_adapter.go:136-158

func NewStoreAdapter[H header.Header[H]](getter StoreGetter[H], gen genesis.Genesis) *StoreAdapter[H] {
    // ...
    var storeHeight uint64
    if h, err := getter.Height(context.Background()); err == nil {  // ⚠️
        storeHeight = h
    }

Problem: Using context.Background() in constructors prevents graceful cancellation during initialization.

Recommendation: Accept a context parameter or use a short timeout context:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

7. LRU Cache Eviction Not Documented

Location: pkg/store/store_adapter.go:127-128

// pending is an LRU cache for items received via Append that haven't been
// written to the store yet. Keyed by height. Using LRU prevents unbounded growth.
pending *lru.Cache[uint64, H]

Problem: The default size is 1000 items. If P2P receives blocks faster than the syncer processes them, old blocks are silently evicted from pending cache. This could cause GetByHeight() to fail for in-flight blocks.

Recommendations:

  1. Document the eviction behavior and its implications
  2. Consider making cache size configurable
  3. Add metrics for cache evictions
  4. Log warnings when evictions occur (see the eviction-callback sketch below)
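
For points 3 and 4, hashicorp/golang-lru v2 accepts an eviction callback, which makes evictions observable with very little code. A minimal sketch assuming the v2 generics API and the zerolog logger already used in this PR; newPendingCache is a hypothetical constructor, not part of the change:

package store

import (
    lru "github.com/hashicorp/golang-lru/v2"
    "github.com/rs/zerolog"
)

// newPendingCache builds the pending LRU with an eviction callback so that
// in-flight items dropped from the cache are at least visible in the logs
// (the same hook could increment a metric).
func newPendingCache[H any](size int, logger zerolog.Logger) (*lru.Cache[uint64, H], error) {
    return lru.NewWithEvict[uint64, H](size, func(height uint64, _ H) {
        logger.Warn().
            Uint64("height", height).
            Msg("pending item evicted before being persisted to the store")
    })
}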

8. Type Assertions Without Error Handling

Location: pkg/sync/sync_service.go:229-233, 266-270, 406-408

// Sync is optional for adapters - they may not need explicit syncing
if syncer, ok := syncService.store.(interface{ Sync(context.Context) error }); ok {
    if err := syncer.Sync(ctx); err != nil {
        return false, err
    }
}

Issue: While this pattern works, it's fragile: if the Sync signature ever changes, the type assertion silently stops matching and Sync is simply never called.

Recommendation: Define a proper interface in pkg/store:

type Syncer interface {
    Sync(context.Context) error
}
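
Beyond the cleaner call site, a named interface lets implementations assert conformance at compile time, so signature drift is caught by the compiler rather than by a silently failing assertion. A small self-contained sketch (exampleStore is purely illustrative):

package store

import "context"

// Syncer is the optional capability the sync service probes for.
type Syncer interface {
    Sync(context.Context) error
}

// exampleStore only exists to show the compile-time conformance check.
type exampleStore struct{}

func (exampleStore) Sync(context.Context) error { return nil }

// If Syncer's signature changes, this line stops compiling instead of the
// type assertion in sync_service.go silently returning false.
var _ Syncer = exampleStore{}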

Code Quality & Best Practices

9. Excellent Test Coverage

The PR includes comprehensive test files (data_store_adapter_test.go, header_store_adapter_test.go) with 678 and 606 lines respectively. Tests cover:

  • Basic CRUD operations
  • Pending cache interactions
  • Store precedence logic
  • Edge cases (nil values, context timeouts, partial ranges)
  • Concurrent access patterns

Coverage: 78.73% for the patch is good, though the missing 57 lines should be reviewed.


10. Rollback Simplification - Well Done

Location: apps/evm/cmd/rollback.go:64-69, apps/testapp/cmd/rollback.go:64-70

The rollback commands are significantly cleaner:

// rollback ev-node main state
// Note: With the unified store approach, the ev-node store is the single source of truth.
// The store adapters (HeaderStoreAdapter/DataStoreAdapter) read from this store,
// so rolling back the ev-node store automatically affects P2P sync operations.
if err := evolveStore.Rollback(goCtx, height, !syncNode); err != nil {
    return fmt.Errorf("failed to rollback ev-node state: %w", err)
}

This eliminates the duplicate rollback logic for headers and data stores, reducing code complexity by ~40 lines per app.


11. Documentation Quality

The comments throughout store_adapter.go are excellent:

  • Clear explanation of the adapter pattern (lines 106-113)
  • Detailed description of pending cache behavior (lines 110-113, 434-437)
  • Useful TODOs with issue references (line 233)
  • Function-level documentation for all public methods

Security Considerations

12. Validation of P2P Data

Location: pkg/store/store_adapter.go:434-462

The Append() method adds P2P-received items directly to the pending cache without validation:

func (a *StoreAdapter[H]) Append(ctx context.Context, items ...H) error {
    // ...
    for _, item := range items {
        if item.IsZero() {
            continue
        }
        // ⚠️ No validation of item integrity
        a.pending.Add(height, item)

Analysis: The PR author's comment in review confirms validation happens later during processing (in go-header's session.go:269-285), so this is acceptable. However, consider adding a comment explaining this trust model.
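
A short doc comment on Append would make that trust boundary explicit for future readers; possible wording (illustrative only, the go-header reference comes from the author's reply):

// Append stores P2P-received items in the pending cache without validating
// them here. Integrity and adjacency checks happen later in go-header's sync
// session before the items are processed, so pending items should be treated
// as unverified until then.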


Architectural Review

13. Store Adapter Pattern - Excellent Design

The generic StoreAdapter[H] is well-architected:

  • Type-safe: Uses generics properly with header.Header[H] constraint
  • Composable: Separates concerns via StoreGetter interface
  • Testable: Clean interfaces make mocking easy
  • Reusable: Single implementation for both headers and data

The two-tier caching (pending + store) elegantly handles the timing mismatch between P2P receipt and syncer processing.


14. Sync Service Refactoring - Clean

Location: pkg/sync/sync_service.go:64-86

The removal of exchange_wrapper.go (103 lines) and exchange_wrapper_test.go (181 lines) by using the store adapters directly is a significant simplification:

func NewDataSyncService(
    evStore store.Store,
    conf config.Config,
    genesis genesis.Genesis,
    p2p *p2p.Client,
    logger zerolog.Logger,
) (*DataSyncService, error) {
    storeAdapter := store.NewDataStoreAdapter(evStore, genesis)
    return newSyncService(storeAdapter, dataSync, conf, genesis, p2p, logger)
}

This is much cleaner than the previous wrapper-based approach.


Performance Considerations

15. Pending Cache Iteration Performance

Location: Multiple locations (pendingHead(), Get(), Has(), Height())

Several methods iterate over all pending cache keys:

for _, h := range a.pending.Keys() {
    if item, ok := a.pending.Peek(h); ok && h > maxHeight {
        maxHeight = h
        head = item
    }
}

Analysis: With default size 1000, this is acceptable. However, for very high throughput scenarios (10K+ blocks/sec), consider:

  • Maintaining a separate maxPendingHeight atomic variable
  • Using a sorted data structure for pending items

CI Status

✅ Semantic PR check passed
🔄 Main CI pipeline still running
🔄 PR workflow still running

The CI runs are still in progress, so integration test results are not yet available.


Summary & Recommendations

Must Fix Before Merge

  1. Fix race condition in heightSub.Wait() - Critical for correctness
  2. Add cleanup for cancelled height waiters - Critical for memory safety
  3. Fix error handling in GetByHeight() - Important for debuggability
  4. Handle errors from getter.Height() in Head() - Important for robustness

Should Fix Soon (Follow-up PRs)

  1. Optimize Tail() for pruned nodes (tracked in feat: block Pruning #2984)
  2. Make pending cache size configurable
  3. Add metrics for cache evictions and adapter operations
  4. Consider compare-and-swap for SetHeight() operations

Nice to Have

  1. Document P2P validation trust model in Append()
  2. Define proper interfaces instead of type assertions
  3. Add context parameters to constructors

Conclusion

This PR achieves its primary goal of eliminating data duplication by unifying the store layer. The architecture is sound, the tests are comprehensive, and the code is generally well-written. However, the race condition in heightSub.Wait() is critical and must be fixed before merging to prevent potential deadlocks in production.

Once the critical issues are addressed, this will be a significant improvement to the codebase. The simplified rollback logic and removal of wrapper code demonstrate the value of this refactoring.

Overall Assessment: Approve with required changes. Fix the race condition and memory leak issues, then this is ready to merge.


@gemini-code-assist
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the node's data storage strategy by centralizing all block-related data within the ev-node store. This change aims to prevent data duplication and simplify the overall architecture. By introducing adapter layers, the existing go-header P2P synchronization mechanisms can now operate directly on the unified ev-node store, leading to a more coherent and maintainable system. The modifications also streamline rollback procedures and reduce redundant code paths related to data persistence.

Highlights

  • Unified Store Architecture: The pull request refactors the storage mechanism to use the ev-node store as the single source of truth for both headers and data, eliminating the need for a separate go-header store and reducing data duplication.
  • Introduction of Store Adapters: New HeaderStoreAdapter and DataStoreAdapter implementations are introduced. These adapters wrap the ev-node store, allowing it to satisfy the header.Store interface from the go-header library, thus integrating seamlessly with go-header's P2P infrastructure.
  • Simplified Rollback Logic: The rollback commands (apps/evm/cmd/rollback.go and apps/testapp/cmd/rollback.go) are simplified. They now primarily rely on the ev-node store's Rollback method, removing explicit go-header store operations for headers and data, and improving error handling with errors.Join.
  • Refactored Sync Service Initialization: The pkg/sync/sync_service.go file is updated to use the new store adapters. NewDataSyncService and NewHeaderSyncService now directly accept the ev-node store and wrap it with the appropriate adapter, streamlining the setup of synchronization services.
  • Dependency Clean-up: Imports related to go-header/store and ds.Batching are removed from various files, reflecting the consolidated storage approach and reducing unnecessary dependencies.



Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request successfully refactors the store integration for go-header by introducing DataStoreAdapter and HeaderStoreAdapter. This change eliminates data duplication by allowing the go-header P2P infrastructure to directly utilize the ev-node store. The rollback commands have been updated to reflect this unified store approach, and comprehensive tests have been added for the new adapter implementations. This is a significant improvement in architecture and efficiency.

@codecov

codecov bot commented Jan 30, 2026

Codecov Report

❌ Patch coverage is 78.88889% with 57 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.01%. Comparing base (850a82f) to head (0cc54d0).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/store/store_adapter.go 78.83% 37 Missing and 14 partials ⚠️
pkg/sync/sync_service.go 72.72% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3036      +/-   ##
==========================================
+ Coverage   55.35%   56.01%   +0.66%     
==========================================
  Files         117      117              
  Lines       11689    11843     +154     
==========================================
+ Hits         6470     6634     +164     
+ Misses       4496     4479      -17     
- Partials      723      730       +7     
Flag Coverage Δ
combined 56.01% <78.88%> (+0.66%) ⬆️


☔ View full report in Codecov by Sentry.

@julienrbrt julienrbrt marked this pull request as ready for review January 30, 2026 16:54
// For ev-node, this is typically the genesis/initial height.
// If pruning has occurred, it walks up from initialHeight to find the first available item.
// TODO(@julienrbrt): Optimize this when pruning is enabled.
func (a *StoreAdapter[H]) Tail(ctx context.Context) (H, error) {
Member Author

@julienrbrt julienrbrt Jan 30, 2026


It is O(1) for non-pruned nodes and O(n) for pruned nodes, so we should improve this by saving the pruned tail in state in #2984

var errElapsedHeight = errors.New("elapsed height")

// defaultPendingCacheSize is the default size for the pending headers/data LRU cache.
const defaultPendingCacheSize = 1000
Member Author


This should leave enough headers/data for p2p-syncing nodes before they execute the block.
We can consider making it bigger otherwise.

Contributor

@tac0turtle tac0turtle left a comment


Do we still need to do WriteToStoreAndBroadcast in the sync loop if we have this?

@julienrbrt
Member Author

julienrbrt commented Feb 1, 2026

Do we still need to do WriteToStoreAndBroadcast in the sync loop if we have this?

We shouldn't need it indeed 👍

@julienrbrt julienrbrt requested a review from tac0turtle February 2, 2026 11:32
// eliminating the need for a separate go-header store and reducing data duplication.
//
// The adapter maintains an in-memory cache for items received via P2P (through Append).
// This cache allows the go-header syncer and P2P handler to access items before they
Contributor


Are we worried about syncing data that doesn't apply to us and then distributing it around?

Member Author

@julienrbrt julienrbrt Feb 2, 2026


How could we? WriteToStoreAndBroadcast is still the one doing the distribution.
EDIT: checking that we don't shortcut header verification.


Contributor

@tac0turtle tac0turtle left a comment


utACK

@julienrbrt julienrbrt added this pull request to the merge queue Feb 2, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 2, 2026
@julienrbrt julienrbrt enabled auto-merge February 2, 2026 13:08
@julienrbrt julienrbrt added this pull request to the merge queue Feb 2, 2026
Merged via the queue into main with commit e9b9f39 Feb 2, 2026
26 of 27 checks passed
@julienrbrt julienrbrt deleted the julien/simplify-store branch February 2, 2026 13:35
@claude claude bot mentioned this pull request Feb 3, 2026
